AITopics

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Feb-17-2026, 19:01:14 GMT

CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search

An efficient two-stage approach is designed for exploring CSPG, with fast approaching and cross-partition expansion.

information retrieval, machine learning, natural language, (18 more...)

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)

Neural Information Processing Systems, Oct-10-2025, 14:51:26 GMT

CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search

An efficient two-stage approach is designed for exploring CSPG, with fast approaching and cross-partition expansion.

graph, proximity graph, vector, (14 more...)

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)

arXiv.org Artificial Intelligence, Sep-10-2025

Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval

Xu, Haike, Chen, Tong

The widely used retrieve-and-rerank pipeline faces two critical limitations: it is constrained by the initial retrieval quality of the top-k documents, and the growing computational demands of LLM-based rerankers restrict the number of documents that can be effectively processed. We introduce Reranker-Guided-Search (RGS), a novel approach that bypasses these limitations by directly retrieving documents according to reranker preferences rather than following the traditional sequential reranking method. Our method uses a greedy search on proximity graphs generated by approximate nearest neighbor algorithms, strategically prioritizing promising documents for reranking based on document similarity. Experimental results demonstrate substantial performance improvements across multiple benchmarks: 3.5 points on BRIGHT, 2.9 on FollowIR, and 5.1 on M-BEIR, all within a constrained reranker budget of 100 documents. Our analysis suggests that, given a fixed pair of embedding and reranker models, strategically selecting documents to rerank can significantly improve retrieval accuracy under a limited reranker budget.
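
The greedy, budget-limited graph search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the adjacency-dict graph format, the `score` callable standing in for a reranker, and the function name `reranker_guided_search` are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def reranker_guided_search(
    graph: Dict[Hashable, Sequence[Hashable]],  # hypothetical: doc id -> neighbor ids
    seeds: Sequence[Hashable],                  # e.g. first-stage retrieval results
    score: Callable[[Hashable], float],         # hypothetical stand-in for a reranker call
    budget: int = 100,                          # maximum number of reranker calls
) -> List[Tuple[Hashable, float]]:
    """Best-first search that spends reranker calls near high-scoring documents."""
    scored: Dict[Hashable, float] = {}
    frontier: list = []  # max-heap emulated with negated scores
    counter = 0          # tie-breaker so the heap never compares doc ids

    def try_score(doc: Hashable) -> None:
        nonlocal counter
        if doc in scored or len(scored) >= budget:
            return
        scored[doc] = score(doc)
        heapq.heappush(frontier, (-scored[doc], counter, doc))
        counter += 1

    for doc in seeds:
        try_score(doc)

    # Repeatedly expand the best document that has not been expanded yet.
    while frontier and len(scored) < budget:
        _, _, best = heapq.heappop(frontier)
        for neighbor in graph.get(best, ()):
            try_score(neighbor)

    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

The design choice illustrated here is that the reranker budget is spent on neighbors of the current best-scoring document rather than on a fixed top-k list, so a relevant document missed by the first stage can still be reached through the graph.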

large language model, machine learning, natural language, (15 more...)

2509.07163

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)

Nasim, Imran, Weber, Melanie

Automated Manifold Learning for Reduced Order Modeling

arXiv.org Artificial Intelligence, Jun-3-2025

The problem of identifying geometric structure in data is a cornerstone of (unsupervised) learning. As a result, Geometric Representation Learning has been widely applied across scientific and engineering domains. In this work, we investigate the use of Geometric Representation Learning for the data-driven discovery of system dynamics from spatial-temporal data. We propose to encode similarity structure in such data in a spatial-temporal proximity graph, to which we apply a range of classical and deep learning-based manifold learning approaches to learn reduced order dynamics. We observe that while manifold learning is generally capable of recovering reduced order dynamics, the quality of the learned representations varies substantially across different algorithms and hyperparameter choices. This is indicative of high sensitivity to the inherent geometric assumptions of the respective approaches and suggests a need for careful hyperparameter tuning, which can be expensive in practice. To overcome these challenges, we propose a framework for Automated Manifold Learning, which selects a manifold learning approach and corresponding hyperparameter choices based on representative subsamples of the input graph. We demonstrate that the proposed framework leads to performance gains both in scalability and in the learned representations' accuracy in capturing local and global geometric features of the underlying system dynamics.

artificial intelligence, machine learning, manifold, (16 more...)

2506.01741

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Mar-15-2024, 01:42:24 GMT

Data Skeletonization via Reeb Graphs

Recovering hidden structure from complex and noisy non-linear data is one of the most fundamental problems in machine learning and statistical inference. While such data is often high-dimensional, it is of interest to approximate it with a low-dimensional or even one-dimensional space, since many important aspects of data are often intrinsically low-dimensional. Furthermore, there are many scenarios where the underlying structure is graph-like, e.g., river/road networks or various trajectories. In this paper, we develop a framework to extract, as well as to simplify, a one-dimensional "skeleton" from unorganized data using the Reeb graph. Our algorithm is very simple, does not require complex optimizations and can be easily applied to unorganized high-dimensional data such as point clouds or proximity graphs. It can also represent arbitrary graph structures in the data. We also give theoretical results to justify our method. We provide a number of experiments to demonstrate the effectiveness and generality of our algorithm, including comparisons to existing methods, such as principal curves. We believe that the simplicity and practicality of our algorithm will help to promote skeleton graphs as a data analysis tool for a broad range of applications.

algorithm, graph, reeb graph, (16 more...)

Country: North America > United States > Ohio (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science (0.89)

Chabert-Liddell, Saint-Clair, Bez, Nicolas, Gloaguen, Pierre, Donnet, Sophie, Mahévas, Stéphanie

Auto-encoding GPS data to reveal individual and collective behaviour

arXiv.org Artificial Intelligence, Dec-1-2023

We propose an innovative and generic methodology to analyse individual and collective behaviour through individual trajectory data. The work is motivated by the analysis of GPS trajectories of fishing vessels collected from regulatory tracking data in the context of marine biodiversity conservation and ecosystem-based fisheries management. We build a low-dimensional latent representation of trajectories using convolutional neural networks as non-linear mapping. This is done by training a conditional variational auto-encoder taking into account covariates. The posterior distributions of the latent representations can be linked to the characteristics of the actual trajectories. The latent distributions of the trajectories are compared with the Bhattacharyya coefficient, which is well-suited for comparing distributions. Using this coefficient, we analyse the variation of the individual behaviour of each vessel over time. For collective behaviour analysis, we build proximity graphs and use an extension of the stochastic block model for multiple networks. This model results in a clustering of the individuals based on their set of trajectories. The application to French fishing vessels enables us to obtain groups of vessels whose individual and collective behaviours exhibit spatio-temporal patterns over the period 2014-2018.
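
As an illustration of the Bhattacharyya coefficient mentioned in this abstract, the sketch below computes it in closed form for two Gaussians with diagonal covariance. The diagonal-Gaussian form is an assumption (the abstract does not state the family of the latent posteriors), and the function name is hypothetical.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient_diag_gauss(
    mu1: Sequence[float], var1: Sequence[float],
    mu2: Sequence[float], var2: Sequence[float],
) -> float:
    """Bhattacharyya coefficient BC = exp(-D_B) of two diagonal Gaussians.

    Assumption: both distributions are Gaussian with diagonal covariance,
    given as per-dimension means and (strictly positive) variances.
    """
    distance = 0.0  # Bhattacharyya distance D_B, accumulated per dimension
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)                                # averaged variance
        distance += 0.125 * (m1 - m2) ** 2 / v             # mean-separation term
        distance += 0.5 * math.log(v / math.sqrt(v1 * v2)) # variance-mismatch term
    return math.exp(-distance)
```

The coefficient lies in (0, 1] and equals 1 only when the two distributions coincide, so it can serve as a similarity weight between trajectories.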

artificial intelligence, machine learning, trajectory, (18 more...)

2312.00456

Country:

Atlantic Ocean > North Atlantic Ocean > Bay of Biscay (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Food & Agriculture > Fishing (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-23-2023

Coupled-Space Attacks against Random-Walk-based Anomaly Detection

Lai, Yuni, Waniek, Marcin, Li, Liying, Wu, Jingwen, Zhu, Yulin, Michalak, Tomasz P., Rahwan, Talal, Zhou, Kai

Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical coupled-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade detection by RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the coupled-space attack against graph anomaly detection in which the graph space relies on the feature space.

feature-space attack, graph, node, (16 more...)

2307.14387

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > India > Telangana > Hyderabad (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kulkarni, Hrishikesh, MacAvaney, Sean, Goharian, Nazli, Frieder, Ophir

Lexically-Accelerated Dense Retrieval

arXiv.org Artificial Intelligence, Jul-31-2023

Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. Their ability to identify related documents that do not necessarily contain the same terms as those appearing in the user's query (thereby improving recall) is one of their key advantages. However, to actually achieve these gains, dense retrieval approaches typically require an exhaustive search over the document collection, making them considerably more expensive at query-time than conventional lexical approaches. Several techniques aim to reduce this computational overhead by approximating the results of a full dense retriever. Although these approaches reasonably approximate the top results, they suffer in terms of recall -- one of the key advantages of dense retrieval. We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness. LADR uses lexical retrieval techniques to seed a dense retrieval exploration that uses a document proximity graph. We explore two variants of LADR: a proactive approach that expands the search space to the neighbors of all seed documents, and an adaptive approach that selectively searches the documents with the highest estimated relevance in an iterative fashion. Through extensive experiments across a variety of dense retrieval models, we find that LADR establishes a new dense retrieval effectiveness-efficiency Pareto frontier among approximate k nearest neighbor techniques. Further, we find that when tuned to take around 8ms per query in retrieval latency on our hardware, LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
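
The two LADR variants described in this abstract can be sketched as follows. This is a simplified reading of the abstract, not the authors' code: the adjacency-dict graph, the `dense_score` callable, the `depth` parameter, and the stopping rule of the adaptive variant (stop once the current top documents have no unscored neighbors) are all assumptions.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Doc = Hashable
Graph = Dict[Doc, Sequence[Doc]]  # hypothetical: doc id -> neighbor ids


def top_k(scored: Dict[Doc, float], k: int) -> List[Tuple[Doc, float]]:
    """Return the k highest-scoring (doc, score) pairs."""
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]


def ladr_proactive(graph: Graph, seeds: Sequence[Doc],
                   dense_score: Callable[[Doc], float], k: int):
    """Score the lexical seeds plus the graph neighbors of every seed."""
    candidates = set(seeds)
    for doc in seeds:
        candidates.update(graph.get(doc, ()))
    return top_k({doc: dense_score(doc) for doc in candidates}, k)


def ladr_adaptive(graph: Graph, seeds: Sequence[Doc],
                  dense_score: Callable[[Doc], float], k: int, depth: int):
    """Iteratively score neighbors of the current top `depth` documents."""
    scored = {doc: dense_score(doc) for doc in dict.fromkeys(seeds)}
    while True:
        top = [doc for doc, _ in top_k(scored, depth)]
        new = {nb for doc in top for nb in graph.get(doc, ()) if nb not in scored}
        if not new:  # assumed stopping rule: nothing left to explore near the top
            break
        for doc in new:
            scored[doc] = dense_score(doc)
    return top_k(scored, k)
```

In this sketch the proactive variant makes one fixed expansion, while the adaptive variant keeps following the graph toward higher dense scores, so its cost depends on the query.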

information retrieval, ladr, machine learning, (20 more...)

doi: 10.1145/3539618.3591715

2307.16779

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > District of Columbia > Washington (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

arXiv.org Artificial Intelligence, Jun-5-2023

BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs

Yang, Zhen, Huang, Tinglin, Ding, Ming, Dong, Yuxiao, Ying, Rex, Cen, Yukuo, Geng, Yangliao, Tang, Jie

In-Batch contrastive learning is a state-of-the-art self-supervised method that brings semantically-similar instances close while pushing dissimilar instances apart within a mini-batch. The key to its success is the negative sharing strategy, in which every instance serves as a negative for the others within the mini-batch. Recent studies aim to improve performance by sampling hard negatives within the current mini-batch, whose quality is bounded by the mini-batch itself. In this work, we propose to improve contrastive learning by sampling mini-batches from the input data. We present BatchSampler (the code is available at https://github.com/THUDM/BatchSampler) to sample mini-batches of hard-to-distinguish (i.e., hard and true negatives to each other) instances. To make each mini-batch have fewer false negatives, we design the proximity graph of randomly-selected instances. To form the mini-batch, we leverage random walk with restart on the proximity graph to help sample hard-to-distinguish instances. BatchSampler is a simple and general technique that can be directly plugged into existing contrastive learning models in vision, language, and graphs. Extensive experiments on datasets of three modalities show that BatchSampler can consistently improve the performance of powerful contrastive models, as shown by significant improvements of SimCLR on ImageNet-100, SimCSE on STS (language), and GraphCL and MVGRL on graph datasets.
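
The random-walk-with-restart sampling step named in this abstract can be illustrated with a small sketch. It covers only the walk itself, not the proximity graph construction, and it is not taken from the BatchSampler repository: the function name, the adjacency-dict graph format, and the `restart_prob` and `max_steps` parameters are assumptions.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def sample_batch_rwr(
    graph: Dict[Hashable, Sequence[Hashable]],  # hypothetical: instance id -> neighbor ids
    start: Hashable,
    batch_size: int,
    restart_prob: float = 0.15,                 # assumed restart probability
    max_steps: int = 10_000,                    # guard against small components
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Collect distinct instances visited by a random walk with restart."""
    rng = rng or random.Random()
    batch = [start]
    seen = {start}
    current = start
    for _ in range(max_steps):
        if len(batch) >= batch_size:
            break
        neighbors = graph.get(current, ())
        # Restart at the start node with probability restart_prob,
        # or when the walk reaches a node without neighbors.
        if not neighbors or rng.random() < restart_prob:
            current = start
            continue
        current = rng.choice(neighbors)
        if current not in seen:
            seen.add(current)
            batch.append(current)
    return batch
```

Because the walk keeps returning to the start node, the sampled instances stay close to it in the graph, which is the sense in which the batch members are hard to distinguish from one another. The batch may be smaller than `batch_size` if the reachable component is small.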

artificial intelligence, batchsampler, machine learning, (14 more...)

2306.03355

Country:

North America > United States > California > Los Angeles County > Long Beach (0.05)
Asia > China (0.04)
Oceania > Australia > South Australia (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)